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ABSTRACT 

We develop and test implicit methods for unstructured mesh computations. The approximate 
system which arises from the Newton-linearization of the nonlinear evolution operator is solved 
by using the preconditioned GMRES (Generalized Minimum Residual) technique. We investi- 
gate three different preconditioners, namely, the incomplete LU factorization (ILU), block diag- 
onal factorization and the symmetric successive over-relaxation (SSOR). The preconditioners 
have been optimized to have good vectorization properties. We also study SSOR and ILU 
themselves as iterative schemes. The various methods are compared over a wide range of prob- 
lems. Ordering of the unknowns, which affects the convergence of these sparse matrix iterative 
methods, is also investigated. Results are presented for inviscid and turbulent viscous calcula- 
tions on single and multi -element airfoil configurations using globally and adaptively generated 
meshes. 


This research was partially supported under the National Aeronautics and Space Administration under 
NASA Contract No. NAS 1-18605 while the second author was in residence at the Institute for Computer 
Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23665. 
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1. INTRODUCTION 

Impressive progress has been made in the area of algorithms for unstructured meshes in 
the last few years. Much attention has been focussed on improving the spatial discretization 
operator ([1-3]) which has evolved to a very high degree of sophistication. Usually explicit 
methods, such as Runge-Kutta schemes, have been used to march the solution to steady state. 
Some acceleration techniques such as local time stepping and residual averaging have also been 
implemented in this context. However, for large problems as well as stiff turbulent flow prob- 
lems, the convergence rates of such methods degrade rapidly, resulting in inefficient solution 
techniques. In order to speed up convergence and propagate information more rapidly 
throughout the domain, more sophisticated multigrid or implicit methods are required. 

The unstructured multigrid algorithm of Mavriplis [4] has been shown to produce 
efficient steady-state solutions for both the Euler and Navier-Stokes equations. In this approach, 
convergence acceleration is achieved by time-stepping on coarser unstructured meshes which 
may be generated independently from the fine mesh on which the equations are originally 
discretized. The principle behind this algorithm is that the errors associated with the high fre- 
quencies are annihilated by a carefully chosen smoother (a multi-stage Runge-Kutta scheme) 
while the errors associated with the low frequencies are annihilated on the coarser grids where 
these frequencies manifest themselves as high frequencies. The disadvantage of such an 
approach lies in the fact that the acceleration is achieved through the use of additional 
geometric constructions (i.e. user generated coarse meshes) which is often viewed as less desir- 
able than for example an algebraic multigrid approach. A fully implicit method, wherein the 
system of linear equations is solved by direct methods, was developed and tested by Venkatak- 
rishnan and Barth [5]. While providing a robust solution technique, direct methods are plagued 
by nonoptimal computational complexity and high storage requirements. Furthermore, for non- 
linear systems with inexact linearizations, since the linear system of equations which arises at 
each time step need not be solved to a high degree of precision in order to maintain favorable 
overall (nonlinear) convergence rates, iterative implicit solvers may be employed. 

Iterative implicit methods for unstructured problems have been investigated by Whitaker 
et. al [6], Hassan et al. [7], Struijs et al. [8] and Batina [9]. Venkatakrishnan [10] has tested 
preconditioned iterative methods on structured grid problems with special emphasis on vector 
performance issues. He concluded that some of these methods are quite competitive with other 
existing methods, while being readily applicable to unstructured grids. In this work we extend 
some of the ideas from [10] to unstructured grids. 

Spatial discretization is achieved using piecewise-linear finite-elements. For dissipative 
terms, a blend of Laplacian and biharmonic terms is employed, the Laplacian term acting in the 
vicinity of shocks. The use of this particular discretization affords a relatively simple construc- 
tion of the linear system, while enabling a straight-forward comparison of the implicit schemes 
with the previously developed multigrid strategy. For turbulent flow calculations, the unstruc- 
tured mesh implementation of the Baldwin-Lomax algebraic model developed in [11] is incor- 
porated. This model is not differentiable, and is therefore treated explicitly in the present 
scheme. The implicit methods investigated in this work are not restricted to any scheme in 
particular, and in the future may be applied to more complex upwind discretizations and more 
sophisticated multi-equation turbulence models. 
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2. IMPLICIT SCHEME 

In non-dimensional conservative form, the hill Navier-Stokes equations read 
dw_ tyc_ *gc_ fa/ v 3 gv ] 

dt dx + dy Re_ dx + dy ^ 

where w represents the solution vector (conserved variables), and f c and g c represent the 
Cartesian components of the convective fluxes which are non linear functions of the w vari- 
ables, and /„ and g v are the Cartesian components of the viscous fluxes, which are functions of 
both the w variables, and the first derivatives of the w variables. The variables are stored at the 
vertices of a triangular mesh which is generated from a prescribed distribution of points by 
Delaunay triangulation [4], Details of the spatial discretization using a finite volume scheme 
and its relation to a piecewise-linear finite element method may be found in [4], 

The discretization of the governing equations in space leads to the following system of 
ordinary differential equations: ....... 


M 0 ( 2 ) 

where R represents the spatial discretization operator, or the residual, which vanishes at 
steady-state and M represents the mass matrix, which contains the information relating the 
average value in a control volume to the values at the vertices. Since we are only interested 
here in steady state solutions, the mass matrix can be replaced by the identity matrix yielding 

+ R(w) = 0 (3) 

If the time derivative is replaced by: 


dw _ w '”' 1 ' 1 — w" 
dt A t 


(4) 


then an explicit scheme is obtained by evaluating R(w) at time level n , and an implicit scheme 
by evaluating R (w) at level n+1. In the latter case, linearizing R about time level n, one 
obtains: 




SW, = (W'" +1 - IV"); 


(5) 


Eqn. (5) represents a large nonsymmetric linear system of equations for the updates of the vec- 
tor of unknowns and needs to be solved at each time step. As 8t tends to infinity, the method 

reduces to the standard Newton’s method. The term symbolically represents the implicit 

side upon linearization and involves the Jacobian matrices of the flux vectors. The discretized 
convective fluxes are linearized exactly on the left-hand side of the equation. Only a first order 
accurate representation of the artificial dissipation terms is employed in the linearization on the 
left hand side, due to storage considerations. This results in the graph of the sparse matrix 

being identical to the graph of the supporting unstructured mesh (i.e. every vertex in the 

matrix is connected only to its nearest neighbors). The sparse matrix thus has a symmetric 
structure, even though the matrix itself is not symmetric. Linearization of the complete bihar- 
monic dissipative terms would result in a much denser matrix with a different graph, since 
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each vertex would also be connected to its second to nearest neighbors. The storage require- 
ments for the representation of such a matrix become prohibitive. The penalty in making this 
approximation in the linearization is that we can never approach Newton’s method (with its 
associated quadratic convergence property) due to the mismatch of the right and left hand side 
operators in Eqn. (5). The viscous fluxes are linearized with a few approximations. First, the 
iaminar viscosities, which are computed using Sutherland’s law, are not linearized in the 
energy equation, and the average quantities at the cell centers are approximated as well. The 
validity of these approximations has been established by solving a very low Reynolds number 
laminar flow at very high CFL numbers (non-dimensionalized time steps). Second, the alge- 
braic turbulence model, being nondifferentiable is not linearized and is treated explicitly. 

Since the linear system is itself approximate there is little to be gained by solving it to a 
great precision. To obtain favorable overall (nonlinear) convergence, it has been found that it is 
better to solve the linear problem to a moderate degree of precision and proceed to the next 
time step. However, for stiff problems it may well be necessary to solve the linear problem 
well and one has the control to do so in the present framework. The time step in Eqn. (5) is 
taken to be inversely proportional to the L 2 norm of the residual. Since we have a mismatch of 
operators in Eqn. (5), it is necessary to limit the maximum time step. 

The system of linear equations is solved in the present work by the GMRES technique 
developed by Saad and Schultz [12]. There is a host of iterative methods for solving nonsym- 
metric linear systems. Each of these methods has its own advantages but in the present context 
we shall just employ one: GMRES. Venkatakrishnan [10] compared the Chebychev semi- 
iteration technique to GMRES for structured CFD problems and found GMRES to be margi- 
nally better. Moreover, the choice of a particular iterative technique is not as important as that 
of a good preconditioner, and the better the preconditioner, the more computationally intensive 
it is, diminishing the relative importance of the iterative method. Without a good precondi- 
tioner, most of these iterative methods fail to converge for the kind of stiff problems which 
arise in computational fluid dynamics. 

The GMRES technique is quite efficient for solving sparse nonsymmetric linear systems 
and is outlined below. Let x 0 be an approximate solution of the system 

A x + B = 0 (6) 

where A is an invertible matrix. The solution is advanced from x 0 to x k as 


Xk = x 0 +y k 

GMRES(k) finds the best possible solution for y k over the Krylov subspace < 
vi i Av lf A 2 v 1 ,...A t-1 v 1 > by solving the minimization problem 

\\r k \\ - Min y llvi+Ayll 

V\ = A Xq + B , r k = A x k + B 

GMRES procedure forms an orthogonal basis v, v 2 . v k (termed search directions) spanning 

the Krylov subspace by a modified Gram-Schmidt method. Storage is required to store these 
search directions. As k increases, the storage increases linearly and the number of operations, 
quadratically. To mitigate this, Saad and Schultz also describe GMRES (k,m) which is a res- 
tarted GMRES (k), where the k search directions are discarded and recomputed every m 
cycles. GMRES can also be thought of as an optimal polynomial acceleration scheme. Precon- 
ditioning greatly improves the performance of GMRES as well as the other related iterative 
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methods. It decreases the size of the spectrum so that the optimal polynomial generated by 
GMRES can better annihilate the errors associated with each eigenvalue. 

3. PRECONDITIONING 

Instead of Eqn. (6) the preconditioned iterative methods solve the following systems: 

P A x +P B = 0 (7) 

AQ(Q- l x)+B= 0 (8) I 

The systems of linear equations in Eqn. (7) and Eqn. (8) are referred to respectively as, left 
preconditioned and right preconditioned sys tems and P and Q as left and right preconditioners. j 

The role of the preconditioner is to cluster the eigenvalues around unity. For reasons given in j 

[10] we shall just employ right preconditioning. We have examined three preconditioners, 
namely the incomplete LU factorization, SSOR and block diagonal. We will describe below the j 

preconditioners and the optimizations done to extract the best vector performances out of them. j 

A simple choice is a block diagonal pr econditi oner which computes the inverse of the ' 

4x4 diagonal block associated with a grid point. Good vectorization when using this precondi- 
tioner is easy to achieve by unrolling the LU decomposition of the 4x4 diagonal matrix as well 
as the forward and back solves over all the grid points. A family of preconditioners arises out 
of an incomplete LU factorization and is referred Yo as ILU(n). Here n represents the level of 
fill-in. n=0 implies no fill-in beyond the original nonzero pattern. In the present work ILU(O) is 
used since it is quite robust and has lower storage requirements. It is also possible to cast the 
symmetric successive over-relaxation (SSOR) as a preconditioner as has been shown by Saad 
[14]. Saad recommends setting the relaxation factor to 1 when using SSOR as preconditioner. I 

In this case the SSOR preconditioner looks exactly like the ILU preconditioner, except that the 
lower and the upper factors are read off directly from the matrix A rather than by an incom- 
plete factorization. The incomplete factorization is a nonvectorizable procedure (although paral- 
lelizable by using wavefront ordering described below) and SSOR preconditioning dispenses ! 

with this sequential procedure. We will also test ILU factorization and SSOR as iterative tech- 
niques by themselves for solving the linear sub-problems at each time step. 

4. DATA STRUCTURES I 

I 

In this section we describe the data structures and kernels employed which are critical in l 

reducing memory requirements and obtaining good performance. In the course of the GMRES 
method with preconditioning as per Eqn. (8) we need to address two kernels. 

The first kernel is a sparse matrix - dense vector multiply to compute A x. The most = 

commonly used data structures [15] are not ideal for this purpose since they have poor vectori- 
zation properties. The ITPACK data structure, which allocates storage based on the maximum 
number of nonzeros in a row, is inefficient for sparse matrices arising from unstructured grids, 
because the degree of a vertex is arbitrary. The data structure that we use for storing the 
sparse matrix A is most easily explained by interpreting the underlying triangular mesh as an 
undirected graph. Associated with each edge are the two vertices, say nl and n2, which are - 

incident to the edge. The spatial discretization operator (the right hand side) utilizes this data - 

structure and therefore, this information is already available. We store the two 4x4 matrices 
which contain the influence of n2 on nl (entry in row nl and column n2 in A) and vice versa. 

The diagonal blocks are stored separately. With such a data structure, we can carry out a - 
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matrix vector multiplication efficiently by employing a coloring algorithm to color the edges of 
the original mesh to get vector performance. Note that the data structure deals with blocks of 
4x4 matrices; for a scalar matrix the above mentioned data structure is roughly equivalent to 
the coordinate storage scheme [15]. However, since the graph of the sparse matrix is 
equivalent to that of the supporting unstructured mesh, the matrix is known to have a sym- 
metric structure (although the matrix itself is not symmetric). Hence, we achieve a savings 
with respect to the standard coordinate storage scheme by only storing the coordinates of the 
upper half of the matrix. 

The second kernel deals with the effect of the preconditioner Q on a vector. Q is D~ l for 
block diagonal preconditioning and (L U)~ l for ILU/SSOR preconditioning, where the ~ indi- 
cates approximate factors. The block diagonal case is straight-forward in this aspect and was 
discussed earlier. The ILU/SSOR preconditioners require repeated solutions of sparse triangular 
systems. By using a level scheduling (also known as wavefront ordering) [16,17] it is possible 
to obtain good vector performance. Under this permutation of the matrix, unknowns within a 
wavefront are eliminated simultaneously. The key step in this procedure is an off-diagonal rec- 
tangular matrix - vector multiply. This requires that L and U be stored in a convenient form 
and we choose a data structure similar to that of A. In addition to the nonzero blocks and the 
column numbers which are provided by the factorization, we store the row numbers. With this 
additional information, the data structure becomes similar to the edge-based data structure 
employed for the A matrix except that we only store one block per edge. The off-diagonal 
matrix vector multiply can then be vectorized by interpreting the rectangular matrix as a 
directed graph and coloring the directed edges. The performances are further enhanced by per- 
forming all the operations on blocks of size 4x4 since we are dealing with coupled systems. 

The memory requirements for the present algorithm are linear in n, the number of ver- 
tices. The implicit scheme requires three arrays of size 7x1 6n in addition to a few integer 
arrays of size n. One of these arrays stores the matrix A in the edge-based data structure, a 
second in the YSMP format which is suitable for the factorization and the third contains the 
L and the U factors. The factor 7 comes from having 3 times as many edges as vertices (valid 
for all 2-D triangular grids, neglecting boundary effects); we store two blocks per edge plus the 
diagonal matrix for all the vertices. The second array is reused for storing the search directions 
in GMRES, permitting up to 27 search directions to be stored. Block diagonal preconditioning 
dispenses with one of these arrays. 

The ordering of unknowns has a bearing on the convergence properties of many iterative 
methods. This is true for iterative methods which involve a directional bias such as the 
SSOR/ILU preconditioning. For structured meshes in [10,18] it was found that a column-major 
ordering which minimized the bandwidth (the "most local" ordering) yielded the best conver- 
gence rates. For unstructured meshes we have settled on the Reverse Cuthill-Mckee (RCM) 
ordering [15]. This is a standard ordering used in sparse direct methods to reduce fill-in, but it 
also appears to be the "most local” ordering. We have also tested orderings based on coordi- 
nates of the vertices (sorting the vertices by the x coordinates, y coordinates or some combina- 
tion of x and y coordinates). The RCM ordering gives marginally better convergence rates over 
a wide range of problems. RCM is also more efficient in that it creates fewer wavefronts, thus 
producing longer vectors. 

To achieve good overall vector performance, careful attention also needs to be paid to the 
assembly of the matrix. In the present set-up, the matrix assembly is performed by looping 
over the edges as far as possible. This is easily done for the inviscid fluxes and the first order 
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dissipative terms, but is quite involved for the full viscous fluxes. We have found it expedient 
to assemble the matrix for the viscous fluxes by looping over the triangles instead and coloring 
the triangles to achieve vectorization. The Jacobians are derived analytically, but with some 
approximations for the viscous terms as was discussed earlier. 

5. RESULTS AND DISCUSSION 

The iterative method outlined above requires a few parameters. The start-up CFL number 
and the maximum CFL number that can be used need to be specified. It is also possible to 
freeze the factorization after a few time steps (or after a prescribed reduction in the residual) 
and increase the efficiency of the code, since it eliminates the assembly and/or the factorization 
of the matrix. This is an additional parameter. GMRES requires a few parameters. It requires 
the maximum number of search directions k, the number of restart cycles m and a tolerance 
level which specifies the desired order of reduction of the residual of the linear sub-problem. 
The solution to the linear system is terminated when the number of iterations exceeds the 
specified maximum whether or not the tolerance criterion is met. In all the problems, the toler- 
ance is set to 10~ 5 . 

We first study a standard airfoil case, namely inviscid flow over the ubiquitous 
NACA0012 airfoil at a freestream Mach number of 0.8 at 1.25° angle of attack. The unstruc- 
tured grid contains 4224 vertices or 8192 triangles. A close-up of the nearly uniform grid is 
shown in Fig. la. The solution (not shown here) agrees with standard results. We obtain lift, 
drag and moment coefficients of 0.3523, 0.0226 and -0.0452 respectively. The convergence 
histories of five different methods arc shown in Fig. lb as a function of CPU time. Since we 
are dealing with different methods which require varying amounts of work at each time step 
we believe that CPU time is the only true measure for comparing them. Since there are quite a 
few parameters involved in each of thes e methods, what we have shown is the "best" conver- 
gence history obtained with each method. GMRES with ILU preconditioning (GMRES/ILU) 
uses 5 search directions, CFL 20- 10 6 and freezes the factorization after 30 time steps. 
GMRES/SSOR, wherein SSOR is used as the preconditioner, employs 15 search directions, 
CFL 20- 10 6 and freezes the matrix after 30 time steps. GMRES/DIAG, which uses block diag- 
onal preconditioner, employs 25 search directions with 3 restarts, CFL 10-500,000 and freezes 
the preconditioner after 25 time steps. The ILU iteration uses CFL 1-50 and freezes the matrix 
after 25 steps. Finally, the SSOR iteration uses CFL 1-25 and freezes the matrix after 30 time 
steps. Using multiple "inner" sub-iterations with the ILU and the SSOR iteration schemes in 
order to be able to use larger time steps turns out be less efficient for this problem. The 
number of time steps taken by GMRES/I LU, GMRES/SSOR, GMRES/DIAG, ILU and SSOR 
are 75, 100, 75, 700 and 700 respectively, the parameters given above for the five methods, 
we believe, are nearly optimal for this problem and yield the best convergence history for each 
of the methods. Having to choose many parameters is a major drawback in using iterative 
methods to solve the approximate linear systems arising from nonlinear problems. However, 
we will be able to provide some guidelines for choosing these parameters for the best of these 
methods, namely GMRES/ILU, by solving a few more representative problems. In Fig. lb, we 
notice that GMRES/DIAG is quite slow even for this simple problem, while ILU iteration 
appears to be quite good. SSOR iteration and GMRES/SSOR have similar convergence his- 
tories. SSOR as a preconditioner is not as effective as the ILU preconditioner, GMRES/ILU 
appears to be the best of all the methods. As we shall see, as the problems get bigger and 
more stiff, GMRES/ILU performs much better than the other four methods. 
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We next consider inviscid subcritical flow over a 4 element airfoil at a freestream Mach 
number of 0.2 and angle of attack of 5°. The triangular mesh employed has 10395 vertices. 
The grid is shown in Fig. 2a. The solution is not shown here and may be found in Mavriplis 
[4], In Fig. 2b we present the convergence histories of GMRES/ILU, GMRES/DIAG and ILU 
and SSOR iteration. GMRES/SSOR had great difficulties in the initial stages and is not shown. 
GMRES/ILU converges much better than the other methods. The parameters for GMRES/ILU 
are 10 search directions and CFL 20- 10 6 , the factorization being frozen after 30 time steps. 
GMRES/DIAG employs 25 search directions with 2 restarts, CFL 10-5X10 5 and freezes the 
preconditioner after 30 time steps. ILU iteration uses CFL 1-50, freezes the matrix after 50 
time steps and does not use sub-iterations. SSOR iteration uses CFL 0.5-5 and freezes the 
matrix after 100 time steps. The number of time steps taken by GMRES/ILU, GMRES/DIAG, 
ILU and SSOR are 100, 70, 400 and 400 respectively. SSOR, either by itself or as a precondi- 
tioner, is clearly unsatisfactory. 

We compare the performances of the methods on a transonic turbulent flow over an 
RAE2822 airfoil, referred to as Case 6. The flow conditions are M. = 0.729, a = 2.31° and 
Reynolds number 6.5 x 10 6 based on the chord. The flow is computed on a mesh with 13751 
vertices which contains cells in the boundary layer and the wake region with aspects ratios up 
to 1000:1. The grid is shown in Fig. 3a. The pressure plot and skin friction distribution and 
experimental data are shown in Figs. 3b and 3c. The lift, drag and moment coefficients are 
0.7342, 0.0132 and -0.0978. Fig. 3d shows the convergence histories of the various methods. 
We notice that only GMRES/ILU and GMRES/DIAG converge, the latter doing so much more 
slowly. GMRES/SSOR diverges for any reasonable CFL numbers at all and its convergence 
history is not shown. The parameters for GMRES/ILU are 25 search directions and CFL 5- 
25000. We freeze the factorization after 80 time steps. We also freeze the turbulence model 
after nearly six orders of reduction in the residual; otherwise, the residual hangs and the con- 
vergence of the method slows down. The effect of freezing the turbulence model in this 
fashion has minimal effect on the aerodynamic coefficients (less than 0.02% change in lift 
coefficient). The parameters for GMRES/DIAG are the same as for GMRES/ILU. The number 
of time steps taken by both GMRES/ILU and GMRES/DIAG is 150. The unstructured mul- 
tigrid algorithm of Mavriplis [4] takes nearly 300 secs, on the YMP to reduce the L 2 norm of 
the residual to .3 x 10 -3 and GMRES/ILU takes about 450 secs, to get to the same level (7 ord- 
ers of reduction in residual) for this problem. In the full multigrid algorithm, the problem is 
first solved on coarser grids, whereas GMRES/ILU starts from freestream conditions on the 
fine grid. The ILU and SSOR iterations use 10 sub-iterations, CFL .5-2.5 and still do not con- 
verge after 200 time steps. 

The final case computed is turbulent flow over a four-element airfoil computed on an 
adapted grid with 48691 vertices. The grid and a close-up view near the leading edge are 
shown in Figs. 4a and 4b. The flow conditions are = 0.1995, a = 16.02° and Reynolds 
number of 1.1 87x 10 6 . The convergence histories with and without freezing the turbulence 
model are shown in Fig. 4c as a function of the CPU time. The number of time steps taken is 
400. The multigrid algorithm takes 2100 secs, to reduce the residual to 1.79 x 1CT 2 while 
GMRES/ILU takes about 2000 secs, to reach the same stage (five orders of reduction of the 
residual). The computed Mach contours for this case are shown in Fig. 4d, illustrating the 
complexity of this flow. In Fig. 4e the computed surface pressure distribution is compared with 
experimental wind-tunnel data. 
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In summary, we have found that for inviscid flows 5-10 search directions are usually 
sufficient whereas the turbulent viscous cases require 25 search directions with GMRES/ILU. 

The start-up CFL number is usually about 20 for inviscid problems and about 5 for turbulent 
viscous cases and the CFL number is allowed to increase up to 500-50000 fold. We use non- 
restarted GMRES whenever possible, which eliminates one of the parameters and is better 
suited for stiff problems (see [12]). The GMRES/ILU runs at about 90-120 MFlops on the 
Cray YMP (uni-processor) at the NAS facility, with performance improving as the problems 
get larger. 

6. CONCLUSIONS 

We have compared five candidate implicit methods for solving the compressible Navier- * 

Stokes equations. For inviscid problems, with a small number of vertices and low cell aspect 
ratios, many of the methods work well, GMRES with ILU preconditioning performing the best. 

For larger problems, especially at high Reynolds numbers, almost all the methods except for i 

GMRES/ILU converge extremely slowly, if at all. Not surprisingly, SSOR, either an an itera- \ 

tion or as a preconditioner, suffers dramatically as the problem increases in size or in the 1 

degree of complexity. GMRES/ILU is quite competitive with the unstructured multigrid algo- | 

rithm, while eliminating the need For independent coarse grids to be generated. It does, how- 
ever, incur a larger memory overhead than the multigrid algorithm. Even though these methods 
have been compared for a particular spatial discretization, we believe the trends should hold for 
other discretizations as well. We have carried out a number of optimizations to extract the best 
vector performances out of all these methods. Finally, the turbulence model itself appears to 
inhibit convergence in the latter stages. This needs further investigation and perhaps incorporat- 
ing a field equation model with proper linearization would solve the problem. 
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Figure la 

Mesh for Computing Inviscid Flow over NACA 0012 Airfoil 
(Number of Vertices = 4224) 
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Figure lb 

Convergence Histories of Various Implicit Methods 
for Inviscid Row Over a NACA 0012 Airfoil 
(Mach = 0.8, Incidence = 1.25 degrees) 
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Figure 2a 

Mesh for Computing Inviscid Flow over a Four-Element Airfoil 
(Number of Vertices = 10,395) 
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Figure 2b 

Convergence Histories of Various Implicit Methods 
for Inviscid Flow Over Four-Element Airfoil 
(Mach = 0.2, Incidence = 5 degrees) 



Figure 3a 

Mesh for Computing Viscous Turbulen t Flow Over an RAE 2822 Airfoil 
(Number of Vertices = 13751) 
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Figure 3b 

Computed Surface Pressure Distribution for Viscous Turbulent Flow over RAE 2822 Airfoil 
(Mach = 0.729, Incidence = 2.31 degrees, Reynolds Number = 6.5 million) 
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Figure 3c 

Computed Skin Friction Distribution over RAE 2822 Airfoil 
= 0.729, Incidence = 2.31 degrees, Reynolds Number = 6.5 million) 
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Figure 3d 

Convergence Histories of Various Implicit Methods for Viscous Turbulent Flow 

over an RAE 2822 Airfoil 

(Mach = 0.729, Incidence = 2.31 degrees, Reynolds Number = 6.5 million) 
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Figure 4a 

Global View of Adaptively Generated Mesh for Computing Viscous Turbulent Flow over a Four-Element Airfoil 

(Number of Vertices = 48,691) 
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Figure 4b 

Close-up View in Region of Leading Edge of Adaptively Generated Mesh about Four-Element Airfoil 
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Figure 4c 

Convergence History of ILU-GMRES Implicit Scheme and Effect of Freezing 
the Algebraic Turbulence Model for Flow Over Four-Element Airfoil 
(Mach = 0.1995, Incidence = 16.02 degrees, Reynolds Number = 1.187 million) 
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Figure 4d 

Computed Mach Contours for Viscous Turbulent Flow Over Four-Element Airfoil 
(Mach = 0.1995, Incidence = 16.02 degrees, Reynolds Number = 1.187 million) 
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